A Dirichlet Process Mixture of Hidden Markov Models for Protein Structure Prediction.

Authors

  • Kristin P Lennox
  • David B Dahl
  • Marina Vannucci
  • Ryan Day
  • Jerry W Tsai
Abstract

By providing new insights into the distribution of a protein's torsion angles, recent statistical models for these data have pointed the way to more efficient methods for protein structure prediction. Most current approaches have concentrated on bivariate models at a single sequence position. There is, however, considerable value in simultaneously modeling angle pairs at multiple sequence positions in a protein. One area of application for such models is structure prediction for the highly variable loop and turn regions. Such modeling is difficult because the number of known protein structures available to estimate these torsion angle distributions is typically small. Furthermore, the data are "sparse" in that not all proteins have angle pairs at each sequence position. We propose a new semiparametric model for the joint distributions of angle pairs at multiple sequence positions. Our model accommodates sparse data by leveraging known information about the behavior of protein secondary structure. We demonstrate our technique by predicting the torsion angles in a loop from the globin fold family. Our results show that a template-based approach can now be successfully extended to modeling the notoriously difficult loop and turn regions.
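The abstract gives only the high-level idea. To make the generative structure named in the title more concrete, here is a minimal forward-sampling sketch of a Dirichlet process mixture of hidden Markov models over (phi, psi) torsion-angle pairs at consecutive sequence positions. It is not the authors' model. The truncated stick-breaking approximation, the three hypothetical hidden states `H`/`E`/`C` standing in for secondary-structure classes, the "sticky" transition prior, the independent von Mises emissions for phi and psi, and the helper names `stick_breaking`, `random_hmm` and `sample_segment` are all illustrative assumptions. The sketch performs no posterior inference.

```python
import math
import random

# Hypothetical hidden states standing in for secondary-structure classes
# (helix, strand, coil); this is an assumption, not the paper's state space.
STATES = ("H", "E", "C")


def stick_breaking(alpha, truncation, rng):
    """Truncated stick-breaking weights approximating a Dirichlet process prior."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last component takes the leftover mass, so weights sum to 1
    return weights


def random_hmm(rng, kappa=4.0):
    """Draw one mixture component: an HMM whose hidden states emit (phi, psi) pairs."""
    trans = {}
    for s in STATES:
        # "Sticky" rows: a larger Gamma shape on the diagonal favors staying in a state.
        raw = [rng.gammavariate(5.0 if t == s else 1.0, 1.0) for t in STATES]
        total = sum(raw)
        trans[s] = [r / total for r in raw]
    # Emissions: independent von Mises on phi and psi per state (a simplifying assumption).
    emit = {s: (rng.uniform(0.0, 2 * math.pi), rng.uniform(0.0, 2 * math.pi), kappa)
            for s in STATES}
    return {"trans": trans, "emit": emit}


def wrap(angle):
    """Map an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def sample_segment(components, weights, rng, present):
    """Sample (phi, psi) pairs for consecutive positions from the DP mixture of HMMs.

    Positions with present[i] == False yield None, mimicking sparse data: the hidden
    state still advances, so the Markov chain links the observed neighbors.
    """
    comp = rng.choices(components, weights=weights)[0]  # pick one HMM component
    state = rng.choice(STATES)
    out = []
    for observed in present:
        mu_phi, mu_psi, kappa = comp["emit"][state]
        if observed:
            phi = wrap(rng.vonmisesvariate(mu_phi, kappa))
            psi = wrap(rng.vonmisesvariate(mu_psi, kappa))
            out.append((phi, psi))
        else:
            out.append(None)
        state = rng.choices(STATES, weights=comp["trans"][state])[0]
    return out


rng = random.Random(0)
weights = stick_breaking(alpha=1.0, truncation=10, rng=rng)
components = [random_hmm(rng) for _ in weights]
segment = sample_segment(components, weights, rng, present=[True, True, False, True, True])
```

Each draw first selects one HMM component according to the stick-breaking weights, then walks that component's hidden chain along the segment. The `present` mask illustrates how an HMM can skip positions that are missing for a given protein while the hidden states still connect the observed neighbors.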

Similar resources

Learning Infinite Hidden Relational Models

Relational learning analyzes the probabilistic constraints between the attributes of entities and relationships. We extend the expressiveness of relational models by introducing for each entity (or object) an infinite-state latent variable as part of a Dirichlet process (DP) mixture model. It can be viewed as a relational generalization of hidden Markov random field. The information propagates ...

Bayesian non-parametric hidden Markov models with applications in genomics

We propose a flexible non-parametric specification of the emission distribution in hidden Markov models and we introduce a novel methodology for carrying out the computations. Whereas current approaches use a finite mixture model, we argue in favour of an infinite mixture model given by a mixture of Dirichlet processes. The computational framework is based on auxiliary variable representations o...

Using Dirichlet Mixture Priors to Derive Hidden Markov Models for Protein Families

A Bayesian method for estimating the amino acid distributions in the states of a hidden Markov model (HMM) for a protein family or the columns of a multiple alignment of that family is introduced. This method uses Dirichlet mixture densities as priors over amino acid distributions. These mixture densities are determined from examination of previously constructed HMMs or multiple alignments. It ...

Expert Finding using discriminative infinite Hidden Markov Model

The process of finding the right expert for a given problem in an organization is becoming feasible. Using web-surfing data, it is feasible to find the advisor who is most likely to possess the desired piece of fine-grained knowledge related to a given query. Web-surfing data is clustered into tasks using a Gaussian Dirichlet process mixture model. In order to mine micro aspects in each task a novel di...

Dirichlet Process Mixture Model with Spatial Constraints

Dirichlet process (DP) provides a nonparametric prior for the mixture model that allows for the automatic detection of the number of hidden states. The recent introduction of variational Bayesian (VB) inference as a deterministic approach makes it practical for large-scale real-world problems. However, the models proposed so far have intrinsic limitations when used on noisy datasets and in situations...

Journal:
  • The Annals of Applied Statistics

Volume 4, Issue 2

Published: 2010